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Here, we describe the BioMart interface to the eMouseAtlas gene expression database EMAGE. EMAGE is a spatiotemporal 
database of in situ gene expression patterns in the developing mouse embryo. BioMart provides a generic web query 
interface and programmable access using web services. The BioMart interface extends access to EMAGE via a powerful 
method of structuring complex queries and one with which users may already be familiar with from other BioMart imple- 
mentations. The interface is structured into several data sets providing the user with comprehensive query access to 
the EMAGE data. The federated nature of BioMart allows scope for integration and cross querying of EMAGE with 
other similar BioMarts. 

Database URL: http://biomart.emouseatlas.org 



Project description 

EMAGE (http://www.emouseatlas.org/emage) is a freely 
available, curated, online database of in situ gene expres- 
sion patterns in the developing mouse embryo (1-3). Gene 
expression domains extracted from raw images are inte- 
grated spatially into a set of standard 3D virtual mouse 
embryo models at different stages of development. In add- 
ition, an anatomy ontology (4) is used to describe sites of 
expression using text annotations. The EMAGE web site 
provides specialized interfaces that allow users to search 
for gene expression patterns. Uniquely, EMAGE searches 
can be made using spatially defined regions, as well as 
standard text based querying. Search results concentrate 
on providing access to images of original experiments and 
mapped expression patterns. However, the database also 
contains a considerable amount of information describing 
the experiments that were carried out to produce the gene 
expression patterns, as well as the data provenance. Much 
of this ancillary data, although visible on the detailed sub- 
mission pages, was not directly searchable in the EMAGE 



web site before the implementation of the BioMart (5) 
interface (http://biomart.emouseatlas.org). BioMart allows 
a greater range of possible queries that can include experi- 
ment details and other ancillary data. It also allows the 
possibility of integration with other BioMarts. However, 
BioMart cannot currently provide spatial searches or 
ready access to image data. The EMAGE web site and the 
BioMart interface therefore should be seen as complemen- 
tary methods of querying EMAGE data content. 

Data content 

The EMAGE data is divided into three BioMart databases. 

EMAGE browse repository BioMart database 

This database provides a summary view of the curated 
EMAGE submissions as well as additional assays that are 
not suitable for spatial mapping and have therefore 
not been included as full EMAGE submissions. It has a 
single data set. The Repository Browse data set has a 
limited number of filters and attributes providing an 
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overview of the EMAGE data. It is the equivalent of the 
'browse' and 'quick search' options on the EMAGE 
web site. 

EMAP anatomy ontology BioMart database 

This database provides a view of the standard anatomy 
ontology of mouse development (EMAP) as used in the 
EMAGE database. The anatomy ontology is constructed as 
a directed acyclic graph (DAG), with a parent and child re- 
lationship, the relationship being 'part-of, for example 
'elbow' is part-of 'arm'. The ontology is defined for each 
Theiler Stage (6) of development and for all stages com- 
bined (the abstract ontology). In the BioMart interface 
these are differentiated by the descriptions 'timed 
stages' and 'abstract'. Each node in the DAG is given a com- 
ponent name and a unique ID. The database contains 
two data sets. 

Anatomy components (component details) data 
set. This data set provides details of the components of 
the ontology. The component name, stage and ID form 
the BioMart filters for this data set as well as being default 
attributes. Further attributes include the DAG relationship 
information plus additional ontological data. 

This data set can be linked in a query to the EMAGE 
submission data set and to the EMAGE text expression 
data set. However, a query for gene expression using 
this data set will find results only for that single anat- 
omy component of the ontology. To search for expres- 
sion in a component and its children, the Anatomy 
Components (including subcomponents) data set should 
be used. 

Anatomy components (including subcomponents). 
This data set consists of the anatomy ontology components 
at each timed stage along with each of the child sub- 
components that are 'part-of that particular component. 
The filters for this data set are the component name, ID 
and Theiler stage. The attributes available are name, ID 
and component path of the parent and child term, plus 
Theiler stage. The 'component path' is the series of compo- 
nent names from the highest level parent through its des- 
cendants to the component term. 

This data set can be linked in a query to the EMAGE 
submission data set and to the EMAGE text expression 
data set. 

EMAGE gene expression BioMart database 

This database comprises the main source of EMAGE gene 
expression and ancillary data. The database contains 
four data sets. Of these, the primary data set is the 
EMAGE submission data set. The other three data sets, as 
well as the anatomy ontology data set, can be linked in a 
query to allow comprehensive search capabilities where 



this is required. Limitations in the translation of the data- 
base schema from native EMAGE to BioMart have resulted 
in this somewhat fragmented data model. Nonetheless, 
access to all the EMAGE data can be achieved either in a 
single or linked BioMart query. 

EMAGE submission data set. This data set comprises 
the bulk of data in EMAGE. An EMAGE submission gener- 
ally will be one specimen stained for the expression of one 
gene or protein at one point in embryonic development. 
Associated with each submission there may be details of 
the experiment, annotated sites of expression, the names 
and details of the data submitter or source, relevant links to 
other data, acknowledgements to data providers and/or 
references, details of the images, associated gene synonyms 
and GO terms. Many search attributes are available and 
have been grouped broadly in the categories just described. 
The filters for this data set are grouped by gene, submis- 
sion, experiment, and links. The EMAGE submission data set 
can be linked in a single query with the EMAGE experiment 
data set or the EMAGE spatial expression annotation data 
set if required. 

EMAGE experiment data set. This data set consists of 
data pertinent to the in situ experiment that produced the 
gene expression image used in a particular EMAGE submis- 
sion. This covers preparation and description of the speci- 
men used the targeted gene, detection reagent type, probe 
sequences and cell line data. Common filters and attributes 
are grouped together for convenience and certain attri- 
butes are set as default for results. The EMAGE experiment 
data set can be linked in a single query with the EMAGE 
submission data set and the EMAGE text expression anno- 
tation data set if required. 

EMAGE spatial expression data set. This data set 
consists of data pertinent to spatial gene expression data 
recorded in EMAGE. A relatively limited set of filters are 
available for this data set and the attributes grouped as 
'Mapped position attributes' are probably only useful in 
the context of the full EMAGE database where the spatial 
information can be used for querying and visualization. 
Nonetheless, an indication of spatial gene expression can 
be found using the filters provided. The most useful of the 
expression filters is probably 'strength'. EMAGE gene ex- 
pression strength is indicated with values of strong, mod- 
erate, weak, possible or not detected. The default attribute 
'EMAGE ID' is returned in the results table as a URL link 
to the submission page where full details and visualiza- 
tions of the mapped expression patterns can be seen. 
The EMAGE spatial expression data set can be linked in a 
single query with the EMAGE submission data set if 
required. 
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EMAGE text expression annotation data set This 
data set comprises the EMAGE gene expression data that 
has been annotated using text descriptions from the anat- 
omy ontology. The filters for this data set, which are 
grouped as expression text annotation filters, include 
ontology ID along with expression strength and pattern. 
Additional filters are also available, under the group head- 
ings of gene, submission and experiment filters. Attributes 
are grouped in the same way as the filters. This data set can 
be linked in a single query with the EMAGE experiment 
data set if required. This allows additional experiment at- 
tributes to be found that are not available from the text 
expression data set. This data set can also be linked in a 
query with the anatomy ontology data set in order to 
obtain more information about an ontology term such as 
its name, path and stage range. 



Query examples 



A number of example queries are given in this section. 
These examples demonstrate the use of the different 
data sets described earlier, and show how these can be 
used to provide powerful and flexible searches of the 
EMAGE data. Screenshots of each query are available as 
online Supplementary Data. 

(1) EMAGE browse repository database. 

Query #1. 'Find a summary of gene expression for Fgf 
family genes at Theiler Stage 18' 



Data set 



Filters 



Attributes 



Repository Browse Gene/Protein: fgf % 


Resource 


Theiler Stage: 18 


ID 




Gene/Protein 




Detection Reagent 




Theiler Stage 




Stage Given 




Assay 




Specimen Type 




Mutant Allele 




URL 



Querying the Repository Browse data set provides an 
overview of gene expression data available in the EMAGE 
repository as illustrated by Query 1. The query has incorpo- 
rated the wild-card symbol (%) in order to find all gene 
symbols starting with 'fgf. The attribute 'URL' provides 
a link to either the full submission description in EMAGE 
for fully curated data, or to the original image source in an 
external resource if the submission is not fully curated in 
EMAGE. 

(2) EMAP anatomy ontology database. 

Query #2. 'Find text annotated gene expression in the limb 
for the Wnt and Hox gene families'. 



Data set 



Filters 



Attributes 



Anatomy 

Components 

(including 

subcomponents) 
EMAGE text 

expression 

annotation 

data set 



Anatomy 
component 
name: limb 

MGI gene 
symbol: 
Wnt%, Hox% 



Parent component name 
Parent component path 
Child component name 
Child component path 
Strength (text annotation) 
Pattern (text annotation) 
MGI gene symbol 
Emage ID 
Theiler stage 



The Anatomy Components (including subcomponents) 
can be used to find gene expression annotation in 
EMAGE when used as a linked database query as illustrated 
by Query 2. The query will return results for the component 
chosen as a filter and all of its subcomponents. Here, the 
query term 'limb' also retrieves expression in, for example, 
'forelimb bud'. The 'Emage ID' attribute appears in the 
results as a link to the full submission description in 
EMAGE (as it does in all cases where the EMAGE ID is 
chosen as an attribute). 

(3) EMAGE gene expression database. 

Query #3. 'Find the EMAGE submissions and details of 
probes used to detect the Fgf gene family at Theiler 
stages 16-20 where the specimen type is section data 
(or unknown) and the assay quality is the highest value'. 



Data set 



Filters 



Attributes 



EMAGE MGI gene MGI gene symbol 

submission symbol: fgf % 
data set Theiler stage: 16-20 Emage ID 

Assay quality: 3 Theiler stage 
Specimen type: Staining procedure 

section, unknown 

Embedding reagent 
Clearing method 
Fixation 
Specimen type 
Assay quality 

EMAGE Detection reagent identifier 

experiment Detection reagent 

data set sequence type 

Detection reagent notes 
ISH probe generated from 
ISH probe chemistry 
ISH probe strand 
ISH probe label 

Querying the EMAGE gene expression BioMart database 
allows potentially complex sets of query conditions and 
output parameters to be defined. This provides a powerful 
tool to filter views of the gene expression data, its experi- 
mental details, and other associated data as illustrated 
by Query 3. The query is constructed to filter the EMAGE 
submission data set for gene, stage, assay quality and 



Page 3 of 5 



Original Article 



Database, Vol. 2011, Article ID bar029, doi:10.1093/database/bar029 



specimen type. The query requires to be linked with the 
EMAGE experiment data set in order to show attributes 
of the experiment and detection reagents. 

Query #4. Tor the detection reagent with the identifier 
MGI:1 334951 (a specific Fgf8 riboprobe) find the EMAGE 
submissions and the publication details of the experiment 
along with any further published references to this 
experiment'. 



Data set 


Filters 


Attributes 


EMAGE 


Detection reagent 


MGI gene symbol 


experiment 


identifier: 


Emage ID 


data set 


MGI:1 334951 


Theiler stage 

Detection reagent identifier 
Authors 


EMAGE 




Accession (reference) 


submission 




Authors 


data set 




Publication year 
Title 

Publication name 
Publication issue 
Publication volume 
Pages 



For a particular in situ hybridization probe of interest a 
user can find all of the instances of gene expression re- 
corded in EMAGE for that probe, along with references 
to the experiment in the published literature as illustrated 
by Query 4. The query is filtered in the EMAGE experiment 
data set using the detection reagent identifier. The query is 
linked with the EMAGE submission data set in order to 
show the various attributes of the publication details. 

Discussion and future direction 

The EMAGE database was conceived primarily to provide 
a unique method of collating, curating, querying and ana- 
lysing spatiotemporal patterns in the mouse embryo includ- 
ing gene expression. In order for the gene expression data 
to be provided in context, additional metadata pertaining 
to the in situ experiments is also included in the database. 
This data is curated rigorously by EMAGE editorial staff to 
provide a high quality data set. The primary focus of the 
EMAGE website is to provide tools to search for gene ex- 
pression, to display this data to users in a visual form as far 
as possible, and to enable spatial query and analysis. The 
implementation of BioMart in EMAGE now allows an add- 
itional text based interface that addresses a need for 
focussed searches across all of the text based information 
held in EMAGE. 

The data mining aspect of BioMart is an important factor 
in providing additional search functionality to EMAGE. 
However, BioMart also provides a standardized web inter- 
face, an application programming interface (API) and a 
RESTful web service. These interfaces are being increasingly 



adopted by a variety of well known biological databases 
and are described further by Haider et al. (7). This means 
that biological researchers and programmers may already 
be familiar with the BioMart interface and may prefer to 
access a database such as EMAGE using tools and tech- 
niques that they have already used with other BioMart 
implementations. 

Another important aspect of BioMart is that it allows 
federated queries between databases (5). The ability to in- 
tegrate data in this way will be the focus of future devel- 
opment of the EMAGE BioMart. A number of existing and 
proposed gene expression databases share a high degree of 
commonality with the EMAGE database. These include 
Eurexpress (8) a transcriptome-wide mouse gene expression 
study that shares the same anatomy ontology. Eurexpress 
currently has its own BioMart implementation but it is 
planned to absorb this data into EMAGE, in which case a 
unified Biomart will be provided. GUDMAP (9), a genito- 
urinary molecular anatomy project also has a common 
anatomy ontology and shares database administration 
and development with EMAGE. It is planned to provide a 
BioMart implementation for GUDMAP in the future. It may 
be possible to integrate this data into a single BioMart in- 
stance with EMAGE but more likely would be to link these 
in a federated manner. The eChickAtlas project (http:// 
www.echickatlas.org/) is a chicken embryo gene expression 
database that is currently under development. This data- 
base will use an ontology derived from, and mapped to, 
the same source as the EMAP mouse ontology. This pro- 
vides scope for interesting cross species queries to be car- 
ried out and could be implemented using the federated 
nature of BioMart. 

Integration of databases in this way should provide re- 
searchers concerned with gene expression data access to a 
large volume of information in an easily accessible and uni- 
fied manner. 

Finally, extending BioMart to spatially mapped data 
opens up the possibility of specifically spatial queries and 
filters. These could be implemented via simple # punch-out' 
type capabilities to the EMAGE graphical interfaces or via 
stored queries held in the database but will require exten- 
sion of the underlying BioMart model. If this were possible 
then it would provide a more significant level of integra- 
tion through to spatially organized data. 

Supplementary Data 

Supplementary data are available at Database Online. 
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